-Looking through the datasets we have available
-Need Google Trends data for search terms such as “Cruz” vs “Rourke” or “midterm election” to see whether the number of searches correlates with the number of votes for each candidate or the number of registered voters
-What kind of analysis do we want to see?
-Review the sample final projects
Some links to keep in mind:
https://www.cnn.com/election/2018/exit-polls/texas/senate
https://www.texastribune.org/2018/10/31/ut-tt-poll-texans-say-immigration-border-security-top-issues/
library(tidyverse)
## ── Attaching packages ────────────────────
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.6
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ──── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(rvest)
## Loading required package: xml2
##
## Attaching package: 'rvest'
## The following object is masked from 'package:purrr':
##
## pluck
## The following object is masked from 'package:readr':
##
## guess_encoding
library(httr)
library(ggridges)
##
## Attaching package: 'ggridges'
## The following object is masked from 'package:ggplot2':
##
## scale_discrete_manual
url = "https://www.nytimes.com/elections/results/texas-senate"
nytimes_data = read_html(url)
nytimes_data
## {xml_document}
## <html lang="en" itemscope="" xmlns:og="http://opengraphprotocol.org/schema/" itemtype="http://schema.org/NewsArticle">
## [1] <head>\n<title>Texas Senate Election Results: Beto O’Rourke vs. Ted ...
## [2] <body class="eln-race-page eln-2018-11-06 eln-forecast">\n<script ty ...
nytimes_data %>%
html_nodes(css = "table")
## {xml_nodeset (2)}
## [1] <table class="eln-table eln-results-table">\n<thead><tr>\n<th class= ...
## [2] <table class="eln-table eln-county-table">\n<thead><tr>\n<th class=" ...
The scraped page contains two tables.
table_overall = (nytimes_data %>% html_nodes(css = "table")) %>%
.[[1]] %>%
html_table()
This plot illustrates how it was a close race between the top two candidates, O’Rourke and Cruz. As Dikeman had very few votes, we decided to omit Dikeman from further analyses.
The first table holds the final statewide results for Texas.
The second table holds the county-level results for all 254 Texas counties.
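The county-level table can be pulled in the same way as the statewide one. The following is only a sketch: it assumes the county table carries the class `eln-county-table` seen in the node output above, and it uses a tiny made-up HTML snippet (placeholder names and numbers, not real results) so that it runs offline.

```r
library(rvest)

# Made-up stand-in for the NYT page: a statewide table, then a county table.
# All values are placeholders, not real election results.
page = read_html('
<html><body>
  <table class="eln-table eln-results-table">
    <thead><tr><th>Candidate</th><th>Votes</th></tr></thead>
    <tbody><tr><td>A</td><td>10</td></tr></tbody>
  </table>
  <table class="eln-table eln-county-table">
    <thead><tr><th>County</th><th>Cruz</th><th>Rourke</th></tr></thead>
    <tbody>
      <tr><td>Harris</td><td>1</td><td>2</td></tr>
      <tr><td>Dallas</td><td>3</td><td>4</td></tr>
    </tbody>
  </table>
</body></html>')

# Select the county table by class rather than by position.
table_county = page %>%
  html_node(css = "table.eln-county-table") %>%
  html_table()

# Lower-case the column names so a `county` key is available for merging.
names(table_county) = tolower(names(table_county))
```

Selecting by class instead of `.[[2]]` is a design choice: if the page layout changes, the selector fails visibly instead of silently returning a different table.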

-Using the search term “Midterms” and selecting the dataset from the top result
district_searches = read_csv(file = "./data/Search_Data_US_Congressional_District_26Sep2018.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## District = col_character(),
## Code = col_character(),
## State = col_character(),
## FIRST = col_character(),
## SECOND = col_character(),
## THIRD = col_character(),
## FOURTH = col_character(),
## FIFTH = col_character(),
## SIXTH = col_character(),
## SEVENTH = col_character(),
## EIGHTH = col_character(),
## NINTH = col_character(),
## TENTH = col_character(),
## `Maternity leave in the United States` = col_integer(),
## `Single-payer healthcare` = col_integer(),
## `Tax Cuts and Jobs Act of 2017` = col_integer(),
## `Transgender people in the military` = col_integer()
## )
## See spec(...) for full column specifications.
TX_searches =
district_searches %>% janitor::clean_names() %>%
filter(state == "TX")
TX_counts =
TX_searches %>%
count(fifth)
# Most searched topics: health care, immigration, mental health, United Nations
# Second most searched: immigration, health care, Medicare, Medicaid, capital punishment
# Third most searched: Medicare, Medicaid, September 11 attacks, immigration
# Fourth most searched: Medicaid, Medicare, immigration...
# Fifth most searched: Medicaid, mental health, Medicare, September 11 attacks...
# Thus, we should focus on the variables health care, immigration, Medicare, Medicaid, mental health, and September 11 attacks
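The rank-by-rank tallies in the comments above could also be produced in one pass, instead of rerunning `count()` for each rank column. This is a sketch only: `toy_searches` is an invented stand-in for `TX_searches`, with just three rank columns and placeholder topics.

```r
library(dplyr)
library(tidyr)

# Invented stand-in for TX_searches: three districts, three rank columns.
toy_searches = tibble::tibble(
  district = c("TX-01", "TX-02", "TX-03"),
  first  = c("Health care", "Immigration", "Health care"),
  second = c("Immigration", "Health care", "Medicare"),
  third  = c("Medicare", "Medicaid", "Immigration")
)

# Stack the rank columns, then count how often each topic holds each rank.
rank_counts = toy_searches %>%
  gather(key = "rank", value = "topic", first:third) %>%
  count(rank, topic, sort = TRUE)
```

With the real data, the column range would presumably run from `first` to `fifth` (or `tenth`).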
congress_district = read.csv(file = "./data/congress_district2.csv")
# Fix misspelled county names, discovered while merging the data later on.
# The lookahead (?!g) keeps an already-correct "Sterling" from becoming "Sterlingg".
congress_district$county_name = str_replace(congress_district$county_name, "Sterlin(?!g)", "Sterling")
congress_district$county_name = str_replace(congress_district$county_name, "MuCulloch", "McCulloch")
TX_searches =
TX_searches %>%
separate(code, into = c("remove_1", "district_num"), sep = "-") %>%
mutate(district_num = as.numeric(district_num)) %>%
select(district_num, most_searched = first, x2003_invasion_of_iraq:womens_health)
nested_congress =
congress_district %>%
nest(county_name)
merged_searches = merge(TX_searches, nested_congress, by = "district_num", all = TRUE) %>%
unnest()
# rename county_name to county
merged_searches = merged_searches %>%
select(district_num, county = county_name, everything())
merged_nyt_searches = merge(merged_searches, table_county, by= "county", all=TRUE)
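Because `merge(..., all = TRUE)` keeps unmatched rows without complaint, checking for county names that appear in only one table would surface misspellings like the two corrected earlier. A minimal sketch with invented stand-in data frames; only the `county` key mirrors the real tables, and the district numbers are placeholders.

```r
library(dplyr)

# Invented stand-ins for merged_searches and table_county.
searches_toy = data.frame(
  county = c("Sterlin", "Travis", "Harris"),
  district_num = c(1, 2, 3),  # placeholder values
  stringsAsFactors = FALSE
)
votes_toy = data.frame(
  county = c("Sterling", "Travis", "Harris"),
  stringsAsFactors = FALSE
)

# Rows whose county name has no match on the other side of the join.
only_in_searches = anti_join(searches_toy, votes_toy, by = "county")
only_in_votes = anti_join(votes_toy, searches_toy, by = "county")
```

If both results are empty, the outer merge and an inner merge would return the same counties.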
Still figuring out how to display this. Ideas:
1) An interactive bar chart in the current long format. If we use Shiny, a drop-down menu to select a county could show how the top topics vary among counties.
2) Figure out how we can juxtapose how the counties voted vs. topics. Use plotly for interactivity?
3) As an alternative to the first option, how can we show the distribution of topics among counties instead?
4) Focus on the 5 biggest counties or districts? But this would be biased, as these may be metropolitan areas.
merged_nyt_searches_long = gather(merged_nyt_searches, key = topics, value = search_interest, health_care:september_11_attacks)
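As a starting point for the first display idea, a static bar chart for one county can be drawn from the long format. This is a sketch under assumptions: `long_toy` is an invented stand-in for `merged_nyt_searches_long` with placeholder values, and `plot_county_topics()` is a hypothetical helper, not part of the project code.

```r
library(ggplot2)

# Invented stand-in for merged_nyt_searches_long; values are placeholders.
long_toy = data.frame(
  county = rep(c("Harris", "Travis"), each = 3),
  topics = rep(c("health_care", "immigration", "medicare"), times = 2),
  search_interest = c(3, 2, 1, 1, 3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: bar chart of search interest by topic for one county.
plot_county_topics = function(data, county_name) {
  county_data = data[data$county == county_name, ]
  ggplot(county_data, aes(x = reorder(topics, search_interest), y = search_interest)) +
    geom_col() +
    coord_flip() +
    labs(x = "Topic", y = "Search interest", title = county_name)
}

p = plot_county_topics(long_toy, "Harris")
```

In a Shiny app, the `county_name` argument could be bound to a `selectInput()` to get the drop-down behavior described in the ideas list.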
Cleaning data so as to merge with GIS…